Keep walproposer alive until shutdown checkpoint is safe on safekepeers #6712

hlinnaka · 2024-02-09T20:02:57Z

The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one different to real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited.

As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example.

Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core.

After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always.

In commit 568f914, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again.

Closes: #559

github-actions · 2024-02-09T20:04:57Z

2502 tests run: 2380 passed, 0 failed, 122 skipped (full report)

Flaky tests (1)

Postgres 15

test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

functions: 28.8% (7031 of 24443 functions)
lines: 47.6% (43441 of 91295 lines)

* collected from Rust tests only

_{The comment gets automatically updated with the latest test results
ba204b0 at 2024-03-11T19:08:26.723Z :recycle:}

MMeent

I can't check the changes made in the various PG versions due to what seems to be GitHub UI error, but looks OK.

arssher · 2024-02-10T04:39:19Z

I can't check the changes made in the various PG versions due to what seems to be GitHub UI error

Looks like PG change push is missing in the repo.

arssher · 2024-02-10T05:11:30Z

So I can't yet check out the core changes, but I wonder why postmaster treated walproposer as bgw prevously. Looking at SignalSomeChildren/CountChildren it checks only bkend_type, which is set to BACKEND_TYPE_WALSND soon after MarkPostmasterChildWalSender...

hlinnaka · 2024-02-10T10:01:21Z

Looks like PG change push is missing in the repo.

Created branches for those and pushed, should be visible now.

So I can't yet check out the core changes, but I wonder why postmaster treated walproposer as bgw prevously. Looking at SignalSomeChildren/CountChildren it checks only bkend_type, which is set to BACKEND_TYPE_WALSND soon after MarkPostmasterChildWalSender...

The postgres changes show that, but in a nutshell, because walproposer is marked as BACKEND_TYPE_BGWORKER, postmaster never calls IsPostmasterChildWalSender() on it. Postmaster expects that a BACKEND_TYPE_NORMAL process can morph into a walsender, but not a bgworker.

arssher · 2024-02-12T11:11:09Z

There several more things to do:

Check currently is in wrong place, it won't fire until latch is set.
We need to wait not only for flush_lsn on the quorum, but commit_lsn -- safekeepers don't advance commit position in the absense of compute.
Also commit_lsn should be not only recorded, but persisted on safekeepers (otherwise their restart would pull it back); they do that periodically, but we want here that ASAP.
Add test.

I'll try to finish it.

arssher · 2024-02-12T12:18:58Z

Interesting, locally it already works without any fixes. Checked, this is because after checkpoint is flushed postmaster kills walproposer with SIGUSR2, which sets the latch, and walproposer manages to commit (and send commit_lsn) to safekeeper before this.

jcsp · 2024-02-14T09:54:07Z

In #6751 I'm reverting the previous change that appears to have made test_pg_regress less stable -- would probably make sense to revert the revert in this PR, along with the walproposer changes. Hopefully together they make the test stable.

…t flaky" (#6751) The #6666 change appears to have made the test fail more often. PR #6712 should re-instate this change, along with its change to make the overall flow more reliable. This reverts commit 568f914.

jcsp · 2024-02-14T13:22:51Z

In case this is useful: a failure on a branch where I had an explicit wait_for_flush_lsn before compute stop and logged the before + after LSNs, where it looks like the safekeeper has seen the post-stop LSN, but the pageserver hasn't #6752 (comment)

PR #6712 might add a delay in graceful shutdown without change to compute -> sk protocol to force sync of control file on safekeeper to persist commit_lsn, so use 'immediate' mode instead; 'fast' one isn't useful for neon anyway.

arssher · 2024-02-15T15:28:12Z

There several more things to do:

Mostly did them, now need to merge PG PRs and then this one. @hlinnaka please have a look, I can't request your review :)

The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one different to real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited. As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example. Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core. After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always. In commit 568f914, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again. Some tests wrongly assumed that no WAL could have been written between pg_current_wal_flush_lsn and quick pg stop after it. Fix them by introducing flush_ep_to_pageserver which first stops the endpoint and then waits till all committed WAL reaches the pageserver. In passing extract safekeeper http client to its own module.

This test occasionally fails with a difference in "pg_xact/0000" file between the local and restored datadirs. My hypothesis is that something changed in the database between the last explicit checkpoint and the shutdown. I suspect autovacuum, it could certainly create transactions. To fix, be more precise about the point in time that we compare. Shut down the endpoint first, then read the last LSN (i.e. the shutdown checkpoint's LSN), from the local disk with pg_controldata. And use exactly that LSN in the basebackup. Closes #559

hlinnaka requested a review from a team as a code owner February 9, 2024 20:02

hlinnaka requested review from conradludgate, MMeent and arssher and removed request for a team and conradludgate February 9, 2024 20:02

MMeent reviewed Feb 9, 2024

View reviewed changes

hlinnaka force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 11b473a to ddfd4ef Compare February 10, 2024 10:01

hlinnaka mentioned this pull request Feb 12, 2024

Add neon.primary_is_running GUC. #6705

Merged

This was referenced Feb 13, 2024

check_restored_datadir_content failure in pg_regress #559

Closed

Revert #6666 "tests: try to make restored-datadir comparison tests not flaky" #6751

Merged

arssher mentioned this pull request Feb 14, 2024

Stop postgres in immediate mode in VM image. #6763

Open

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from ddfd4ef to 3b09bd1 Compare February 15, 2024 15:23

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch 3 times, most recently from a554700 to 98315c8 Compare February 15, 2024 15:52

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 98315c8 to 6f2ae2b Compare March 11, 2024 08:10

hlinnaka requested a review from a team as a code owner March 11, 2024 08:10

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 6f2ae2b to 09cc433 Compare March 11, 2024 08:15

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 09cc433 to d22a092 Compare March 11, 2024 08:25

arssher approved these changes Mar 11, 2024

View reviewed changes

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from d22a092 to 72bf53b Compare March 11, 2024 09:04

arssher requested a review from knizhnik March 11, 2024 09:05

knizhnik approved these changes Mar 11, 2024

View reviewed changes

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch 2 times, most recently from af08861 to 47022a6 Compare March 11, 2024 18:01

hlinnaka added 2 commits March 11, 2024 21:23

arssher force-pushed the send-shutdown-checkpoint-to-safekeepers branch from 47022a6 to ba204b0 Compare March 11, 2024 18:23

arssher enabled auto-merge (rebase) March 11, 2024 19:00

arssher merged commit 621ea2e into main Mar 11, 2024
53 checks passed

arssher deleted the send-shutdown-checkpoint-to-safekeepers branch March 11, 2024 19:29

koivunej mentioned this pull request Mar 12, 2024

test_empty_tenant_size is flaky #6965

Closed

arssher mentioned this pull request Oct 4, 2024

Investigate safekeeper eviction errors #8758

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep walproposer alive until shutdown checkpoint is safe on safekepeers #6712

Keep walproposer alive until shutdown checkpoint is safe on safekepeers #6712

hlinnaka commented Feb 9, 2024 •

edited by jcsp

Loading

github-actions bot commented Feb 9, 2024 •

edited

Loading

Postgres 15

MMeent left a comment

arssher commented Feb 10, 2024

arssher commented Feb 10, 2024

hlinnaka commented Feb 10, 2024

arssher commented Feb 12, 2024

arssher commented Feb 12, 2024

jcsp commented Feb 14, 2024

jcsp commented Feb 14, 2024

arssher commented Feb 15, 2024

Keep walproposer alive until shutdown checkpoint is safe on safekepeers #6712

Keep walproposer alive until shutdown checkpoint is safe on safekepeers #6712

Conversation

hlinnaka commented Feb 9, 2024 • edited by jcsp Loading

github-actions bot commented Feb 9, 2024 • edited Loading

2502 tests run: 2380 passed, 0 failed, 122 skipped (full report)

Postgres 15

Code coverage* (full report)

MMeent left a comment

Choose a reason for hiding this comment

arssher commented Feb 10, 2024

arssher commented Feb 10, 2024

hlinnaka commented Feb 10, 2024

arssher commented Feb 12, 2024

arssher commented Feb 12, 2024

jcsp commented Feb 14, 2024

jcsp commented Feb 14, 2024

arssher commented Feb 15, 2024

hlinnaka commented Feb 9, 2024 •

edited by jcsp

Loading

github-actions bot commented Feb 9, 2024 •

edited

Loading